Identifying and Correcting Label Bias in Machine Learning (Lyrn.AI)
As machine learning (ML) becomes more effective and widespread, it is increasingly used in systems with real-life impact, from loan recommendations to job application decisions. This growing use carries a risk: biased training data can produce biased ML models, which in turn can perpetuate discrimination in society. In a new paper from Google, researchers propose a technique for training machine learning models fairly even when the dataset is biased. At the heart of the technique is the idea that a biased dataset can be viewed as an unbiased dataset whose labels have been altered by a biased agent. Using this framework, the data points in the biased dataset are re-weighted, without changing their labels, so that training on them corresponds to training on the (theoretical) unbiased dataset; only then is the data fed into a machine learning algorithm.
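Because the correction only changes per-example weights, it can sit in front of any learner whose loss is a sum over examples. The minimal Python sketch below assumes a binary log loss and made-up toy predictions (the names `log_loss` and `weighted_loss` are illustrative, not from the paper). It shows that giving one example an integer weight of 2 is the same as seeing that example twice, while the observed labels themselves stay untouched:

```python
import math


def log_loss(p, y):
    """Binary cross-entropy of one predicted probability p against an observed label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def weighted_loss(probs, labels, weights):
    """Sum of per-example losses, each scaled by its weight.

    The labels are used exactly as observed; only the weights change.
    """
    return sum(w * log_loss(p, y) for p, y, w in zip(probs, labels, weights))


# Hypothetical model outputs and observed (possibly biased) labels.
probs = [0.9, 0.2, 0.6, 0.4]
labels = [1, 0, 1, 0]

# Weight 2 on the third example ...
reweighted = weighted_loss(probs, labels, [1, 1, 2, 1])
# ... matches an unweighted dataset in which that example appears twice.
duplicated = weighted_loss(probs + [0.6], labels + [1], [1] * 5)
print(math.isclose(reweighted, duplicated))  # True
```

Real-valued weights work the same way, scaling each example's pull on the model continuously instead of by whole copies.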
Abstract
Datasets often contain biases which unfairly disadvantage certain groups, and classifiers trained on such datasets can inherit these biases. In this paper, we provide a mathematical formulation of how this bias can arise. We do so by assuming the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases against certain groups. Despite the fact that we only observe the biased labels, we are able to show that the bias may nevertheless be corrected by re-weighting the data points without changing the labels. We show, with theoretical guarantees, that training on the re-weighted dataset corresponds to training on the unobserved but unbiased labels, thus leading to an unbiased machine learning classifier. Our procedure is fast and robust and can be used with virtually any learning algorithm. We evaluate on a number of standard machine learning fairness datasets and a variety of fairness notions, finding that our method outperforms standard approaches in achieving fair classification.
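The re-weighting procedure can be sketched end to end in plain Python. The sketch is loosely modeled on an iterative scheme for demographic parity. It keeps one multiplier per protected group, trains a classifier on the current weights, and measures how far each group's positive rate sits from the overall rate. It then moves the multipliers against that violation and turns them into per-example weights. The toy dataset, the logistic-regression learner, the step size `eta`, the exact weight formula, and every function name here (`make_biased_data`, `correct_label_bias`, and so on) are illustrative assumptions, not the authors' code. The positive rate is approximated by the mean predicted probability to keep the loop smooth.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_biased_data(n_per_group=100):
    """Hypothetical toy data of (x, group, observed_label) triples.

    The unbiased label is 1 when x > 0.5, with every 5th point flipped so the
    classes overlap. A biased labeler then overwrites every other positive
    label in group 1 (even k) with 0, leaving group 0 untouched.
    """
    data = []
    for group in (0, 1):
        for k in range(n_per_group):
            x = (k + 0.5) / n_per_group
            y_true = int(x > 0.5)
            if k % 5 == 0:
                y_true = 1 - y_true  # label noise: keeps the data non-separable
            biased = group == 1 and y_true == 1 and k % 2 == 0
            data.append((x, group, 0 if biased else y_true))
    return data


def train_weighted_logreg(data, weights, theta=None, steps=150, lr=2.0):
    """Weighted logistic regression on features (1, x, group), fit by gradient
    descent. Labels are used as observed; only the per-example weights vary."""
    b0, b1, b2 = theta if theta is not None else (0.0, 0.0, 0.0)
    total_w = sum(weights)
    for _ in range(steps):
        d0 = d1 = d2 = 0.0
        for (x, group, y), w in zip(data, weights):
            r = w * (sigmoid(b0 + b1 * x + b2 * group) - y)
            d0 += r
            d1 += r * x
            d2 += r * group
        b0 -= lr * d0 / total_w
        b1 -= lr * d1 / total_w
        b2 -= lr * d2 / total_w
    return (b0, b1, b2)


def positive_rates(data, theta):
    """Mean predicted probability per group and overall, used here as a smooth
    stand-in for the rate of positive predictions."""
    b0, b1, b2 = theta
    sums, counts = [0.0, 0.0], [0, 0]
    for x, group, _ in data:
        sums[group] += sigmoid(b0 + b1 * x + b2 * group)
        counts[group] += 1
    rates = [s / c for s, c in zip(sums, counts)]
    return rates, sum(sums) / sum(counts)


def example_weights(data, lambdas):
    """Assumed weight form: with w = exp(lambda_group), positives get w/(1+w)
    and negatives get 1/(1+w). Labels are never modified."""
    weights = []
    for _, group, y in data:
        w_tilde = math.exp(lambdas[group])
        weights.append(w_tilde / (1 + w_tilde) if y == 1 else 1 / (1 + w_tilde))
    return weights


def correct_label_bias(data, iterations=20, eta=2.0):
    """Iteratively re-weight the data toward demographic parity (sketch)."""
    lambdas = [0.0, 0.0]  # one multiplier per protected group
    theta = None
    for _ in range(iterations):
        # Warm-start from the previous model to keep each retraining cheap.
        theta = train_weighted_logreg(data, example_weights(data, lambdas), theta)
        rates, overall = positive_rates(data, theta)
        for k in range(len(lambdas)):
            # Positive violation: group k is scored positive too often, so lower it.
            lambdas[k] -= eta * (rates[k] - overall)
    theta = train_weighted_logreg(data, example_weights(data, lambdas), theta)
    return theta, lambdas


data = make_biased_data()
baseline = train_weighted_logreg(data, [1.0] * len(data))
corrected, lambdas = correct_label_bias(data)
print("baseline rates: ", positive_rates(data, baseline)[0])
print("corrected rates:", positive_rates(data, corrected)[0])
```

On this toy data, the baseline model's average score for group 1 should sit well below group 0's, because half of group 1's positive labels were overwritten. After re-weighting, the two rates should be close, even though no label was changed. The hand-rolled trainer could be swapped for any learner that accepts per-example weights (for instance, scikit-learn estimators whose `fit` takes `sample_weight`) without changing the outer loop.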